Learning Unbiased Stochastic Edit Distance in the form of a Memoryless Finite-State Transducer∗
نویسندگان
چکیده
We aim at learning an unbiased stochastic edit distance in the form of a finite-state transducer from a corpus of (input,output) pairs of strings. Contrary to the other standard methods, which generally use the algorithm Expectation Maximization, our algorithm learns a transducer independently on the marginal probability distribution of the input strings. Such an unbiased way to proceed requires to optimize the parameters of a conditional transducer instead of a joint one. This transducer can be very useful in many domains of pattern recognition and machine learning, such as noise management, or DNA alignment. Several experiments are carried out with our algorithm showing that it is able to correctly assess theoretical target distributions.
منابع مشابه
Using Learned Conditional Distributions as Edit Distance
In order to achieve pattern recognition tasks, we aim at learning an unbiased stochastic edit distance, in the form of a finite-state transducer, from a corpus of (input,output) pairs of strings. Contrary to the state of the art methods, we learn a transducer independently on the marginal probability distribution of the input strings. Such an unbiased way to proceed requires to optimize the par...
متن کاملStochastic Contextual Edit Distance and Probabilistic FSTs
String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance—a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-...
متن کاملLearning Conditional Transducers for Estimating the Distribution of String Edit Costs
We focus on the Edit Distance and propose an algorithm to learn the costs of the primitive edit operations. The underlying model is a probabilitic transducer computed by using grammatical inference techniques, that is neither deterministic nor stochastic in the standard terminology. Moreover, this transducer is conditional, thus independent from the distributions of the input strings. Real worl...
متن کاملLearning stochastic finite-state transducer to predict individual patient outcomes
The high frequency data in intensive care unit is flashed on a screen for a few seconds and never used again. However, this data can be used by machine learning and data mining techniques to predict patient outcomes. Learning finite-state transducers (FSTs) have been widely used in problems where sequences need to be manipulated and insertions, deletions and substitutions need to be modeled. In...
متن کاملFinite Growth Models
Finite growth models (FGM) are nonnegative functionals that arise from parametrically-weighted directed acyclic graphs and a tuple observation that aaects these weights. The weight of a source-sink path is the product of the weights along it. The functional's value is the sum of the weights of all such paths. The mathematical foundations of hidden Markov modeling (HMM) and expectation maximizat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005